Using Machine Learning Techniques for Subjectivity Analysis based on Lexical and Non-lexical Features

Authors

  • Hikmat Ullah Khan
  • Ali Daud
Abstract

Machine learning techniques have been applied to a wide variety of problems, and document classification is one of their main applications. Opinion mining has emerged as an active research domain because of its wide range of applications, such as multi-document summarization, opinion mining of documents, analysis of users’ reviews, and improving answers to opinion questions in forums. Existing works classify documents using lexicon-based features only. In this work, four state-of-the-art machine learning techniques are applied to classify content as subjective or objective. Subjective content contains opinionated information, while objective content contains factual information. The main contribution is the introduction of non-lexical and content-based features in addition to a conventional lexicon-based feature set. We compare the results of the four machine learning techniques and discuss their performance across diverse categories of lexical and non-lexical features. The comparative analysis uses standard performance evaluation measures, and the experiments are performed on a real-world dataset from an online forum covering diverse topics. The results show that the proposed content-based and non-lexical, thread-specific features contribute to the classification of subjective and non-subjective content.
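
As a rough illustration of the idea in the abstract, the sketch below builds one feature vector per forum post from a lexicon-based feature, a few content-based features, and one thread-specific feature, and then computes precision, recall, and F1 for the "subjective" class. Everything concrete here is an assumption made for illustration: the abstract does not list the actual features, the lexicon, the four classifiers, or which evaluation measures were used, so `OPINION_WORDS`, `extract_features`, `thread_reply_count`, and `precision_recall_f1` are hypothetical names and not the authors' method.

```python
import re

# Hypothetical mini-lexicon of opinion words (assumption; the paper's
# lexicon is not given in the abstract).
OPINION_WORDS = {"good", "bad", "love", "hate", "great", "terrible", "best", "worst"}


def extract_features(post, thread_reply_count):
    """Build a feature dict mixing lexical and non-lexical signals (illustrative only)."""
    tokens = re.findall(r"[a-z']+", post.lower())
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty posts
    return {
        # Lexical (lexicon-based): share of tokens found in the opinion lexicon.
        "opinion_ratio": sum(t in OPINION_WORDS for t in tokens) / n_tokens,
        # Content-based: surface properties of the post text.
        "num_tokens": len(tokens),
        "num_question_marks": post.count("?"),
        "num_exclamations": post.count("!"),
        # Non-lexical, thread-specific: metadata about the surrounding thread.
        "thread_reply_count": thread_reply_count,
    }


def precision_recall_f1(y_true, y_pred, positive="subjective"):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Feature dicts like these could be passed to any off-the-shelf classifier, and comparing scores with and without the non-lexical entries would mirror the kind of comparison the abstract describes.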

Similar resources

A Model for Detecting of Persian Rumors based on the Analysis of Contextual Features in the Content of Social Networks

A rumor is a collective attempt to interpret a vague but attractive situation by using the power of words. Therefore, identifying the language of a rumor can be helpful in identifying the rumor itself. Previous research on rumor detection has focused more on the contextual information in reply tweets and less on the content features of the original rumor. Most of the studies have been in...

Comprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features

Point cloud and LiDAR filtering removes non-ground features from a digital surface model (DSM) in order to reach the bare earth and extract a DTM. Various methods have been proposed by different researchers to distinguish between ground and non-ground points in point cloud and LiDAR data. Most fully automated methods share a common disadvantage: they are only effective for a particular type of surf...

Sentence-Level Subjectivity Detection Using Neuro-Fuzzy Models

In this work, we attempt to detect sentence-level subjectivity by means of two supervised machine learning approaches: a Fuzzy Control System and an Adaptive Neuro-Fuzzy Inference System. Even though these methods are popular in pattern recognition, they have not been thoroughly investigated for subjectivity analysis. We present a novel “Pruned ICF Weighting Coefficient,” which improves the accurac...

The Use of Lexical Bundles in Native and Non-native Post-graduate Writing: The Case of Applied Linguistics MA Theses

Connor et al. (2008) mention “specifying textual requirements of genres” (p.12) as one of the reasons which have motivated researchers in the analysis of writing. Members of each genre should be able to produce and retrieve these textual requirements appropriately to be considered communicatively proficient. One of the textual requirements of genres is regularities of specific forms and content...

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognizing emotion in Persian text as a supervised machine learning problem. We used the Plutchik emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

Publication date: 2017